Design and evaluation of prosodically-sensitive concatenative units for a Korean TTS system

نویسنده

  • Kyuchul Yoon
چکیده

This paper describes the design and evaluation of prosodically-sensitive concatenative units for a Korean text-to-speech (TTS) synthesis system. The diphones used are prosodically conditioned in the sense that a single conventional diphone is stored as different versions taken directly from the different prosodic domains of the prosodically labeled, read sentences. The four levels of the Korean prosodic hierarchy were observed in the diphone selection process, thereby selecting four different versions of each diphone: three edge diphones from the prosodic domains of the intonational phrase (IP), accentual phrase (AP) and prosodic word (PW), and a non-edge diphone from the domain of the prosodic word. Due to the size of the corpus that we employed, our system covers only 36.4% of the 6,503 possible diphones. A listening experiment designed to evaluate the quality of the diphone database showed that listeners preferred stimuli composed of prosodically appropriate diphones. We interpret this as supporting the view that segments carry prosodic domain information.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

طراحی و ارزیابی یک مدل بازسازی گفتار به روش هم‌گذاری واحدهای حساس به بافت نوایی

This paper describes the design and evaluation of prosodically-sensitive concatenative units for a Persian text-to-speech (TTS) synthesis system. Thesyllables used are prosodically conditioned in the sense that a single conventional syllable is stored as different versions taken directly from the different prosodic domains of the prosodically labeled, read sentences. The three levels of the Per...

متن کامل

A concatenative Mandarin TTS system without prosody model and prosody modification

This paper proposes a two-step solution for generating natural prosody in TTS, in which no prosody prediction and modification are needed. A large phonetically and prosodically enriched speech corpus has been collected as the unit pool for the synthesizer. A multi-tier non-uniform unit selection scheme is developed to pick up the most suitable segments for concatenation from the unit pool. Fina...

متن کامل

Perceptually based automatic prosody labeling and prosodically enriched unit selection improve concatenative text-to-speech synthesis

Prosody is an important factor in the quality of text-tospeech (TTS) synthesis. Typically, acoustic parameters such as f0 and duration are the only variables related to prosody that are used to determine unit selection. Our study explored adding the explicit use of linguistically and perceptually motivated prosodic categories in unit selection-based TTS. One of our goals was to automate the pro...

متن کامل

Data pruning using confidence measures for concatenative synthesis system built using automatically transcribed audio

Today, we can record and store large amounts of single speaker audio data, and also download it from the web. Generally, these data are prosodically rich and can therefore act as excellent candidates for building concatenative text-to-speech (TTS) systems. But transcritpions for these audio data are often not available and automatic transcriptions are error prone. In addition, these audio data ...

متن کامل

Vocalic sandwich, a unit designed for unit selection TTS

Unit selection text-to-speech systems currently produce very natural synthetic sentences by concatenating speech segments from a large database. Recently, increasing demand for designing high quality voices with less data creates need for further optimization of the textual corpus recorded by the speaker. The optimization process of this corpus is traditionally guided by the coverage rate of we...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Computer Speech & Language

دوره 22  شماره 

صفحات  -

تاریخ انتشار 2008